Abstract

This work addresses the problem of non-preemptively scheduling a cyclic set of interdependent
operations, representing for instance a program loop, when $p$ identical processors are available.
For $p = \infty$ we give a simple, efficient, polynomial-time algorithm producing optimum results.
When $p < \infty$ the problem becomes NP-hard, and a slight modification of our algorithm generates
provably close-to-optimum results.


entities  will  be  denoted  by  greek  literals. 

Let $[0,\infty]$ denote the set of non-negative integers and $O[0,\infty]$ the set product of $O$
and $[0,\infty]$, that is $O[0,\infty] = \{op[i] : op \in O \text{ and } i \in [0,\infty]\}$. The
overall goal is to construct a rational schedule, that is a mapping $\sigma$ from $O[0,\infty]$ into
the non-negative rationals, which model time instants, such that

1. $\sigma$ is periodic

$$\exists\, l \ge 1 \quad \exists\, \tau_{ii} \ge 0 \quad \forall\, i \ge 0 \qquad \sigma(op[i]) = \sigma(op[i \bmod l]) + \tau_{ii} \cdot \left\lfloor \frac{i}{l} \right\rfloor$$

2. $\sigma$ has optimum asymptotic performance $\|\sigma\| = \tau_{ii}/l$

3. $\sigma$ satisfies $G$'s dependence constraints

$$\forall\, e = (op, op') \in E \quad \forall\, i \ge 0 \qquad \sigma(op[i]) + \delta(op) \le \sigma(op'[i + d(e)])$$

4. $\sigma$ can be executed by at most $p$ processors

$$\forall\, \tau \ge 0 \qquad \left|\{op[i] \in O[0,\infty] : 0 \le \tau - \sigma(op[i]) < \delta(op)\}\right| \le p$$

A rational schedule which maps $O[0,\infty]$ into the non-negative integers is said to be integral.
A more formal definition of the model will be given in sections 2 and 3.

A central problem in unveiling periodic schedules with asymptotically optimum performance is
determining the maximum duration-to-weight ratio of the cycles in $G$ [18]. More precisely let $P$
be a path in $G$ traversing operations $op_1, \ldots, op_k$ and edges $e_1, \ldots, e_c$, where
$c = k$ if $P$ is a cycle and $c = k - 1$ otherwise. If one defines

$$\delta(P) = \sum_{i=1}^{k} \delta(op_i) \quad \text{and} \quad d(P) = \sum_{i=1}^{c} d(e_i)$$

then the maximum duration-to-weight ratio of the cycles in $G$, which we denote $\rho_G$, is

$$\rho_G = \max_{C \text{ cycle in } G} \frac{\delta(C)}{d(C)}$$

If $G$ is acyclic, $\rho_G$ is zero by definition. When $\delta(op) = 1$ for every operation $op$,
$1/\rho_G$ is called the minimum cycle mean of $G$. Efficient algorithms for the minimum cycle mean
problem have been given by Karp [15], Ahuja & Orlin [1] and Young, Tarjan & Orlin [23]. Let $n$ and
$m$ respectively denote the number of operations and dependence edges in $G$ and
$d_{\max} = \max_{e \in E} d(e)$. Karp's algorithm runs in $O(nm)$ time, Ahuja & Orlin's runs in
$O(\sqrt{n}\, m \log(n\, d_{\max}))$ time and Young, Tarjan & Orlin's runs in $O(nm + n^2 \log n)$
time in the worst case, but experiments on random graphs have suggested that its expected running
time is $O(m + n \log n)$. All minimum cycle mean algorithms can be extended to compute $\rho_G$ in
the case where operations have arbitrary integer durations at the cost of an additional
$O(\log(n\, \delta_{\max}\, d_{\max}))$ factor in time complexity, where
$\delta_{\max} = \max_{op \in O} \delta(op)$ [11]. The general case where operation durations are
positive rationals can be reduced to the integer case by multiplying every $\delta(op)$ by $lcm$,
the least common multiple of the denominators of the $\delta(op)$. The $\rho_G$ obtained by this
transformation must be divided by $lcm$ in order to obtain the actual maximum duration-to-weight
ratio of the cycles in $G$. The algorithm of Young, Tarjan & Orlin can also be used directly in the
general case when operations have rational durations. In that instance the worst case running time
increases to $O(d_{\max}(nm + n^2 \log n))$. The expected running time should also increase, but by
a much smaller factor than $O(d_{\max})$. Note that to improve the running time of the previous
algorithms in the case where $\rho_G \ge \delta_{\max}$ one can initially delete all dependence
edges $e$ such that $d(e) > n$. This guarantees that $d_{\max} \le n$.
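To make the definition of the maximum duration-to-weight ratio concrete, the following minimal sketch enumerates the simple cycles of a small dependence graph and returns the largest ratio. It is not one of the cited algorithms (it takes exponential time) and the graph representation (`ops`, `edges`, `delta`) is an assumption made for illustration only. It relies on the fact that the maximum ratio is always attained by some simple cycle.

```python
from fractions import Fraction

def max_cycle_ratio(ops, edges, delta):
    """Brute-force maximum of delta(C)/d(C) over the simple cycles C of G.

    ops   -- list of operation names (hypothetical representation)
    edges -- list of (op, op2, d) triples, d being the edge weight d(e)
    delta -- dict mapping each operation to its duration
    """
    succ = {op: [] for op in ops}
    for u, v, d in edges:
        succ[u].append((v, d))
    rank = {op: k for k, op in enumerate(ops)}
    best = Fraction(0)  # the ratio is zero by definition when G is acyclic

    def extend(start, node, dur, dist, seen):
        nonlocal best
        for v, d in succ[node]:
            if v == start:
                # Closing a cycle: dur is its duration, dist + d its weight.
                if dist + d == 0:
                    raise ValueError("cycle with zero total weight")
                best = max(best, Fraction(dur) / (dist + d))
            elif v not in seen and rank[v] > rank[start]:
                # Only visit higher-ranked vertices so each cycle is found once.
                extend(start, v, dur + delta[v], dist + d, seen | {v})

    for s in ops:
        extend(s, s, delta[s], 0, {s})
    return best
```

On a graph with cycles of ratio 5/1 and 3/2, for instance, the function returns 5. For realistic sizes one of the polynomial algorithms cited above should be used instead.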

The first problem studied (section 4) is to efficiently generate a periodic schedule with optimum
asymptotic performance when there is an unbounded number of processors available, that is
$p = \infty$. Previous work by Iwano & Yeh assumes integral schedules and integral operation
durations [14]. They give an $O(n m\, l_G + T)$ pseudo-polynomial time algorithm, where $l_G$ is the
denominator of $\rho_G$ in its irreducible form and $T$ the time to compute $\rho_G$. Note that
$l_G$ can be as big as $n\, d_{\max}$. We give a simple $O(T)$ time algorithm for the general case
of rational schedules and rational operation durations. When the schedule and operation durations
are required to be integral the algorithm generates periodic schedules a factor
$\lceil \rho_G \rceil / \rho_G$ away from the asymptotic optimum. A slight modification of the
algorithm generates asymptotically optimum results in $O(n m\, l_G + T)$ time.

When the number of processors $p$ is finite the problem of generating an asymptotically optimum
periodic schedule becomes NP-hard. Our second contribution, section 5, is to efficiently generate
near optimum schedules. Most previous work in this domain has been of empirical nature and
performance bounds have solely been validated by benchmarking. Our algorithm is simple, runs in
$O(T)$ time and guarantees a maximum factor from optimality of
$(2 - 1/p) + (p-1)/p \cdot \delta_{\max}/\|L\|_p$, where $\|L\|_p$ denotes the optimum asymptotic
performance for the input cyclic task system $L$ when $p$ processors are available. A better
performance bound of $(2 - 1/p) + (p-1)/p \cdot \delta_{\max}/(l \cdot \|L\|_p)$, where $l$ is a
user selected parameter, can be obtained at an additional $O(n m\, l^2)$ cost in time complexity.
If operation durations are integral and the generated schedule is also required to be integral, a
slight modification of the algorithm guarantees a worst case performance bound of
$1 + (p-1)/p \cdot ((\delta_{\max} - 1) + \lceil \|L\|_p \rceil)/\|L\|_p$.
2 Task Systems and Admissible Schedules

We first introduce task systems [5,12]. Informally a task system is a collection of several
interdependent operations all of which must be executed in order to complete the task.

Definition 1 A task system $T$ is a triple $T = (O, \delta, \prec)$ where:

1. $O$ is the operation set of $T$, a not necessarily finite set of operations.

2. $\delta$ is the duration function of $T$, a function mapping $O$ into the positive rationals.

3. $\prec$ is the dependence relation of $T$, a partial order on $O$.

If $O$ is finite then $T$ is said to be acyclic, otherwise $T$ is said to be infinite.

The machine model considered comprises $p$ identical processors operating in parallel. There
is no preemption: once started, an operation has to be executed without interruption. Given
some task system $T$ we formalize the notion of a rational schedule $\sigma$ for $T$.

Definition 2 Let $T = (O, \delta, \prec)$ be some task system. An admissible $p$-schedule $\sigma$
for $T$ is a mapping from $O$ into the non-negative rationals such that:

1. No more than $p$ operations are being processed at any given moment:

$$\forall\, \tau \ge 0 \qquad \left|\{op[i] \in O[0,\infty] : 0 \le \tau - \sigma(op[i]) < \delta(op)\}\right| \le p$$

2. No operation can start executing until all operations on which it depends have completed:

$$\forall\, op, op' \in O \qquad op \prec op' \;\Rightarrow\; \sigma(op) + \delta(op) \le \sigma(op')$$

If the schedule $\sigma$ maps $O$ into the non-negative integers, $\sigma$ is said to be integral.
When the task system $T$ is acyclic one defines $|\sigma|$, the length of $\sigma$, as
$|\sigma| = \max_{op \in O}(\sigma(op) + \delta(op))$. If no admissible $p$-schedule for $T$ has a
length smaller than $|\sigma|$, $\sigma$ is said to be $p$-optimum for $T$.
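The two conditions of Definition 2 can be checked mechanically for a finite task system. The sketch below is only an illustration: the dictionary-based representation and the function names are assumptions, and the dependence relation is given as a list of pairs. It uses the observation that the number of running operations can only increase at a start time, so counting at start times suffices.

```python
def is_admissible(sigma, delta, prec, p):
    """Check Definition 2 for a finite task system.

    sigma -- dict mapping each operation to its start time
    delta -- dict mapping each operation to its (positive) duration
    prec  -- iterable of (op, op2) pairs meaning op must complete before op2 starts
    p     -- number of processors
    """
    if any(t < 0 for t in sigma.values()):
        return False
    # Condition 2: every operation finishes before its dependents start.
    if any(sigma[u] + delta[u] > sigma[v] for u, v in prec):
        return False
    # Condition 1: count the operations running at each start time.
    for t in sigma.values():
        running = sum(1 for op in sigma if sigma[op] <= t < sigma[op] + delta[op])
        if running > p:
            return False
    return True

def schedule_length(sigma, delta):
    """Length of the schedule: the latest completion time."""
    return max(sigma[op] + delta[op] for op in sigma)
```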

For any finite $p > 1$, the problem of generating a $p$-optimum schedule for an acyclic task
system is NP-complete [19]. If operations are restricted to have the same duration, a polynomial
time optimum algorithm for the case $p = 2$ was presented by Coffman and Graham and an almost
linear time algorithm was given by Gabow [6,9]. The problem remains open for any fixed $p \ge 3$,
that is, NP-hardness has not been proved or disproved [10]. If however $p$ is considered to be a
parameter the problem becomes NP-hard [19].

The algorithms of Coffman & Graham and Gabow build on the list scheduling framework [5]. List
scheduling algorithms work as follows. The operations in the acyclic task system are implicitly
ordered in a priority list. At any given instant where a processor is free an operation is
scheduled by selecting the first operation in the priority list all of whose predecessors, with
respect to the dependence relation, have finished executing. Note that for $p = \infty$ any list
scheduling algorithm yields optimum schedules. If $p < \infty$ any list scheduling algorithm
guarantees a schedule length of at most $(2 - 1/p)$ times the $p$-optimum [5]. The NP-hardness
proof given by Lenstra & Rinnooy Kan [19] implies that, unless P = NP, no polynomial time
algorithm can approximate $p$-optimality for arbitrary $p$ by less than a factor $4/3$.
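The list scheduling rule described above can be sketched as follows. This is a deliberately simple quadratic-time illustration, not the Coffman & Graham or Gabow algorithm; the data representation and the name `list_schedule` are assumptions. The input is assumed to be acyclic with positive durations and $p \ge 1$.

```python
import heapq
from fractions import Fraction

def list_schedule(priority, delta, prec, p):
    """Greedy list scheduling on p processors.

    priority -- list of operations, highest priority first
    delta    -- dict mapping each operation to its duration
    prec     -- iterable of (op, op2) pairs: op must finish before op2 starts
    Returns a dict mapping each operation to its start time.
    """
    preds = {op: set() for op in priority}
    for u, v in prec:
        preds[v].add(u)
    sigma, finished = {}, set()
    running = []  # min-heap of (finish time, tie-breaker, operation)
    now = Fraction(0)
    while len(sigma) < len(priority):
        # Start ready operations in priority order while a processor is free.
        for op in priority:
            if len(running) == p:
                break
            if op not in sigma and preds[op] <= finished:
                sigma[op] = now
                heapq.heappush(running, (now + delta[op], len(sigma), op))
        # Advance to the next completion and retire everything ending then.
        now = running[0][0]
        while running and running[0][0] == now:
            finished.add(heapq.heappop(running)[2])
    return sigma
```

Any priority order yields a schedule within the $(2 - 1/p)$ factor quoted above; the choice of order only affects how close to the optimum a particular instance gets.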

3 Cyclic Task Systems and Periodic Schedules

The following definition models the behavior of a system which must continuously execute a
fixed set of interdependent operations.

Definition 3 A cyclic task system $L$ is an infinite task system $L = (O[0,\infty], \delta, \prec)$
where:

1. $O[0,\infty]$, the operation set of $L$, is the product of $O$, a finite set of operations, and
$[0,\infty]$.

2. For all $op \in O$ and $i, j \ge 0$, $\delta(op[i]) = \delta(op[j])$. We will denote this number
$\delta(op)$.

3. For all $op, op' \in O$ and $i, j \ge 0$, $op[i] \prec op'[j]$ implies $i \le j$. Furthermore
let $d(op, op') = \min\{j - i : op[i] \prec op'[j]\}$, where the minimum of the empty set is equal
to $\infty$ by definition. Then if $d(op, op') \ne \infty$ it is required that for all $i \ge 0$,
$op[i] \prec op'[i + d(op, op')]$.


The cyclic task system $L = (O[0,\infty], \delta, \prec)$ will be portrayed by a doubly weighted
directed graph $G = (O, E, \delta, d)$ called $L$'s dependence graph. $G$'s vertex set is $O$. To
each vertex $op$ of $G$ we associate its duration $\delta(op)$. $G$'s edge set $E$ must verify the
following two requirements:

1. If $e = (op, op') \in E$ then $d(op, op') < \infty$. The weight of $e$ is set to $d(op, op')$.

2. $\forall\, op, op' \in O \quad d(op, op') < \infty \;\Rightarrow\; \exists$ path $P$ from $op$
to $op'$ s.t. $d(P) = d(op, op')$.

Note that $E$ is not necessarily unique. The edge set

$$E = \{(op, op') : d(op, op') < \infty\}$$

has the biggest cardinality, whereas the edge set

$$E = \{(op, op') : d(op, op') < \infty \text{ and } \nexists\, op'' \;\; d(op, op'') + d(op'', op') \le d(op, op')\}$$

is the edge set with the smallest cardinality. By computing the all pairs shortest paths of $G$
it is easy to transform the original edge set into the one with the fewest edges. This step can be
implemented in $O(nm + n^2 \log n)$ time, where $n$ and $m$ respectively denote the cardinality of
$G$'s vertex and edge set [7].

For cyclic systems one is interested in generating regular schedules which can be finitely
encoded. We introduce the notion of a periodic schedule.

Definition 4 Let $L = (O[0,\infty], \delta, \prec)$ be some cyclic system, $\sigma$ an admissible
$p$-schedule for $L$ and $l \ge 1$, $\tau_{ii} \ge 0$ respectively an integer and a rational. The
schedule $\sigma$ is said to be $(l, \tau_{ii})$-periodic for $L$ if and only if

$$\forall\, op \in O \quad \forall\, i \ge 0 \qquad \sigma(op[i]) = \sigma(op[i \bmod l]) + \tau_{ii} \cdot \left\lfloor \frac{i}{l} \right\rfloor$$

The numbers $l$ and $\tau_{ii}$ are respectively called the unfolding and initiation interval of
$\sigma$. The asymptotic performance of $\sigma$, denoted $\|\sigma\|$, is defined as
$\|\sigma\| = \tau_{ii}/l$.

Note that an $(l, \tau_{ii})$-periodic schedule is perfectly determined by the initiation interval
$\tau_{ii}$ and the time at which operations in the first $l$ iterations start executing.
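The finite encoding of a periodic schedule can be illustrated in a few lines. The sketch assumes the reading of Definition 4 in which the initiation interval is added once per block of $l$ iterations, that is, multiplied by $\lfloor i/l \rfloor$; the table `first` and the function name are hypothetical.

```python
from fractions import Fraction

def periodic_start(first, l, tau_ii, op, i):
    """Start time of op[i] in an (l, tau_ii)-periodic schedule.

    first  -- dict mapping (op, k), 0 <= k < l, to the start time of op[k]
    l      -- unfolding (positive integer)
    tau_ii -- initiation interval (non-negative rational)
    """
    return first[(op, i % l)] + tau_ii * (i // l)

def asymptotic_performance(l, tau_ii):
    """Asymptotic performance: the initiation interval divided by the unfolding."""
    return Fraction(tau_ii) / l
```

With `l = 2` and `tau_ii = 7/2`, iteration 5 of an operation starts 7 time units after its iteration 1, and the asymptotic performance is 7/4.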

When there is an unbounded number of processors, that is $p = \infty$, we show in the next
section how to efficiently construct a periodic schedule with asymptotically optimum performance.
When $p < \infty$ let $\|L\|_p$ denote the $p$-optimum asymptotic performance for the input cyclic
task system $L$. Because of Lenstra & Rinnooy Kan's result [19] it will be shown in section 5 that
no periodic $p$-schedule $\sigma$ admissible for $L$ and satisfying

$$\frac{\|\sigma\|}{\|L\|_p} < \frac{4}{3}$$

can be constructed in polynomial time unless P = NP. We will however provide an efficient
algorithm that generates periodic $p$-schedules at most a factor
$(2 - 1/p) + (p-1)/p \cdot \delta_{\max}/\|L\|_p$ away from $\|L\|_p$.

4 Achieving Optimality for Infinite Processor Machines

In this section we examine the problem of generating periodic rational schedules which are
asymptotically optimum when the number of processors $p$ is infinite. We provide a very simple
and efficient algorithm.

Algorithm 1

Input: A cyclic task system $L$ represented by a dependence graph $G = (O, E, \delta, d)$.
Output: A $(1, \rho_G)$-periodic rational schedule $\sigma$ for $L$ with optimum asymptotic
performance.
Method:

1. Compute $\rho_G$, the maximum duration-to-weight ratio of the cycles in $G$, with any of
the algorithms mentioned in the introduction. This step takes $O(T)$ time.

2. Associate to each edge $e = (op, op') \in E$ the length $\tau(e) = \delta(op) - \rho_G \cdot d(e)$.
Add a vertex $s$ to $G$ and the edges $(s, op)$ for each $op \in O$. Set the length of these edges
to 0. Because of the definition of $\rho_G$ no cycle in $G$ has positive length with respect
to $\tau$. Thus for each $op \in O$ the longest path from $s$ to $op$, denoted $\tau(s, op)$, is
well defined. These longest paths are produced as side results of Karp's or Young, Tarjan
& Orlin's algorithms. Otherwise they can be computed in $O(nm)$ time [7].

3. For all $op \in O$ and $i \ge 0$ set $\sigma(op[i]) = \tau(s, op) + \rho_G \cdot i$.

The overall algorithm requires $O(T)$ time if Karp's or Young, Tarjan & Orlin's algorithms
are used to compute $\rho_G$ and $O(nm + T)$ time otherwise.
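Steps 2 and 3 of algorithm 1 can be sketched as below, assuming the ratio has already been computed by some other means and is passed in as `rho`. The longest paths are obtained with a plain Bellman-Ford style relaxation rather than as a by-product of Karp's or Young, Tarjan & Orlin's algorithms; the function names and graph representation are illustrative assumptions.

```python
from fractions import Fraction

def longest_path_offsets(ops, edges, delta, rho):
    """Longest path length from a virtual source s to every operation.

    edges -- list of (op, op2, d) triples; the edge length is delta[op] - rho * d
    rho   -- the maximum duration-to-weight ratio of the cycles (assumed given)
    """
    # The edges (s, op) have length 0, so every offset starts at 0.
    tau = {op: Fraction(0) for op in ops}
    for _ in range(len(ops)):
        changed = False
        for u, v, d in edges:
            candidate = tau[u] + Fraction(delta[u]) - rho * d
            if candidate > tau[v]:
                tau[v] = candidate
                changed = True
        if not changed:
            return tau
    # Still improving after n rounds: some cycle has positive length.
    raise ValueError("rho is smaller than the maximum cycle ratio")

def start_time(tau, rho, op, i):
    """Step 3: start time of op[i] in the (1, rho)-periodic schedule."""
    return tau[op] + rho * i
```

For two operations `a` (duration 2) and `b` (duration 3) with edges `a -> b` of weight 0 and `b -> a` of weight 1, the ratio is 5, the offsets are 0 and 2, and iteration `i` of `b` starts at time `2 + 5 * i`.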


Theorem 1 The schedule $\sigma$ generated by algorithm 1 is an admissible $(1, \rho_G)$-periodic
schedule for $L$, the cyclic task system represented by $G$. Furthermore $\sigma$ is
asymptotically optimum.

Proof: It is fairly obvious that $\sigma$ is $(1, \rho_G)$-periodic. Let $C$ be a cycle in $G$
comprising edges $(op_1, op_2), \ldots, (op_c, op_1)$. Then

$$\forall\, i \ge 0 \qquad op_1[i] \prec op_2[i + d(op_1, op_2)] \prec \cdots \prec op_1[i + d(C)]$$

thus at most $d(C)$ iterations can be executed every $\delta(C)$ time units and consequently no
schedule can have an asymptotic performance better than $\rho_G$. It remains to show that $\sigma$
is admissible for $L$, the cyclic task system represented by $G$. This is true if and only if

$$\forall\, e = (op, op') \in E \quad \forall\, i \ge 0 \qquad \sigma(op'[i + d(e)]) - \sigma(op[i]) \ge \delta(op)$$

This is certainly true since
$\sigma(op'[i + d(e)]) - \sigma(op[i]) = \tau(s, op') + \rho_G \cdot d(e) - \tau(s, op)$ and
$\tau(s, op') - \tau(s, op) \ge \delta(op) - \rho_G \cdot d(e) = \tau(e)$ by virtue of the longest
path inequalities. $\Box$

If operation durations are integral and the schedule is also required to be integral, algorithm 1
continues to work if in steps 2 and 3 one replaces $\rho_G$ with $\lceil \rho_G \rceil$, that is
one sets $\tau(e) = \delta(op) - \lceil \rho_G \rceil \cdot d(e)$ in step 2 and
$\sigma(op[i]) = \tau(s, op) + \lceil \rho_G \rceil \cdot i$ in step 3. The proof that the
generated schedule is $(1, \lceil \rho_G \rceil)$-periodic and admissible for $G$ is unchanged and
the ratio to the asymptotic optimum performance is clearly $\lceil \rho_G \rceil / \rho_G$. To
generate asymptotically optimum integral schedules we introduce the notion of unrolling.

Definition 5 Given a dependence graph $G = (O, E, \delta, d)$ and a positive integer $l$ one
defines the dependence graph $G^l = (O^l, E^l, \delta, d^l)$, where

1. $O^l = \{op[i] : op \in O \text{ and } 0 \le i < l\}$

2. $E^l = \{(op[i], op'[j]) : e = (op, op') \in E \text{ and } j = (i + d(e)) \bmod l\}$

3. For $e = (op[i], op'[j]) \in E^l$, $d^l(e) = \left\lfloor \dfrac{i + d(op, op')}{l} \right\rfloor$

The graph $G^l$ is said to be obtained by unrolling $G$ $l$ times.
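Unrolling as defined above translates directly into code. The sketch below is an illustration under an assumed representation: copy `i` of operation `op` is the pair `(op, i)` and each edge is an `(op, op2, d)` triple. Each copy is assumed to keep the duration of its original operation.

```python
def unroll(ops, edges, delta, l):
    """Unroll a dependence graph l times.

    Returns (ops_l, edges_l, delta_l) where every operation is a pair (op, i)
    with 0 <= i < l and every edge is a triple (source, target, weight).
    """
    ops_l = [(op, i) for op in ops for i in range(l)]
    delta_l = {(op, i): delta[op] for op, i in ops_l}
    edges_l = [((u, i), (v, (i + d) % l), (i + d) // l)
               for u, v, d in edges
               for i in range(l)]
    return ops_l, edges_l, delta_l
```

Unrolling the two-operation cycle `a -> b` (weight 0), `b -> a` (weight 1) twice gives a single cycle through `a[0], b[0], a[1], b[1]` of total weight 1 and twice the original duration, in line with theorem 2.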

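The key result concerning $G^l$, the graph obtained by unrolling $G$ $l$ times, is

Theorem 2 Let $G$ be some dependence graph and $l$ a positive integer, then
$\rho_{G^l} = l \cdot \rho_G$.

Proof: Let $C$ be some cycle in $G$ comprising edges $e_1 = (op_1, op_2)$, $e_2 = (op_2, op_3)$,
$\ldots$, $e_c = (op_c, op_1)$. Consider the cycle $C^l$ in $G^l$ going through operations

$$op_1[0],\; op_2[d(e_1) \bmod l],\; op_3[\underbrace{((d(e_1) \bmod l) + d(e_2)) \bmod l}_{=\, (d(e_1) + d(e_2)) \bmod l}],\; \ldots,\; op_c[(\textstyle\sum_{i=1}^{c-1} d(e_i)) \bmod l],$$

$$op_1[d(C) \bmod l],\; \ldots,\; op_c[(d(C) + \textstyle\sum_{i=1}^{c-1} d(e_i)) \bmod l],\; \ldots,$$

$$op_1[((l-1) \cdot d(C)) \bmod l],\; \ldots,\; op_c[((l-1) \cdot d(C) + \textstyle\sum_{i=1}^{c-1} d(e_i)) \bmod l],\; op_1[(l \cdot d(C)) \bmod l]$$

Clearly $\delta(C^l) = l \cdot \delta(C)$, furthermore because for any two integers $a$ and $b$

$$\left\lfloor \frac{a}{l} \right\rfloor + \left\lfloor \frac{(a \bmod l) + b}{l} \right\rfloor = \left\lfloor \frac{a + b}{l} \right\rfloor$$

we have $d^l(C^l) = \lfloor (l \cdot d(C))/l \rfloor = d(C)$ and therefore
$\delta(C^l)/d^l(C^l) = l \cdot \delta(C)/d(C)$, which in turn implies
$\rho_{G^l} \ge l \cdot \rho_G$.

Conversely consider a cycle $C^l$ in $G^l$ traversing operations
$op_1[i_1], \ldots, op_c[i_c], op_1[i_1]$ where

$$\forall\, 1 \le j \le c \qquad i_{1 + j \bmod c} = (i_j + d(op_j, op_{1 + j \bmod c})) \bmod l$$

The corresponding cycle $C$ in $G$ which goes through operations $op_1, \ldots, op_c, op_1$ is such
that $\delta(C) = \delta(C^l)$. In addition

$$d^l(C^l) = \sum_{j=1}^{c} \left\lfloor \frac{i_j + d(op_j, op_{1 + j \bmod c})}{l} \right\rfloor$$

Because $i_{1 + j \bmod c} = (i_j + d(op_j, op_{1 + j \bmod c})) \bmod l$, $d(C)$ must be a multiple
of $l$ and $d^l(C^l) = d(C)/l$, which in turn implies $l \cdot \rho_G \ge \rho_{G^l}$. $\Box$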

It follows that to generate an asymptotically optimum integral schedule $\sigma'$ when operation
durations are integral it suffices to apply algorithm 1 to $G^{l_G}$, rather than $G$, where $l_G$
is the denominator of $\rho_G$ in its irreducible form. More precisely the schedule $\sigma'$ is
defined as

$$\forall\, op \in O \quad \forall\, i \ge 0 \qquad \sigma'(op[i]) = \tau(s, op[i \bmod l_G]) + \rho_{G^{l_G}} \cdot \left\lfloor \frac{i}{l_G} \right\rfloor$$

Because $\rho_{G^{l_G}}$ is an integer, $\sigma'$ is an integral schedule. The proof that $\sigma'$
is $(l_G, \rho_{G^{l_G}})$-periodic and admissible for the input cyclic task system $L$ is a
straightforward generalization of the proof in theorem 1. The overall time to generate $\sigma'$ is
$O(n m\, l_G + T)$.
5 Approximating Optimality for Finite Processor Machines

When the number of available processors $p$ is finite even the problem of generating periodic
schedules whose performance is less than a factor $4/3$ from the asymptotic optimum becomes
NP-hard. This can be seen as follows. Let $T$ be some acyclic task system. Create the cyclic
task system $L$ where each iteration has the same operations and dependences as the task
system $T$. In addition each iteration contains a special serializing operation $op_0$ which
depends on all other operations in the iteration and on which every operation in the next
iteration depends. Let $\|L\|_p$ denote the optimum asymptotic performance of $L$ when $p$
processors are available and assume one could create in polynomial time a periodic schedule
$\sigma$, admissible for $L$, such that $\|\sigma\|/\|L\|_p < 4/3$. Then by taking the starting
time of the operations scheduled before $op_0[0]$ it would be possible to generate, in polynomial
time, a schedule for the finite task system $T$ which has a length less than a factor $4/3$ from
the $p$-optimum, and this, as we mentioned at the end of section 2, is possible only if P = NP [19].

As the previous reduction shows, every polynomial time approximation algorithm for the cyclic
scheduling problem can be used to approximate optimality in the acyclic case. The best such
algorithm for the acyclic case guarantees a factor of $(2 - 1/p)$ from $p$-optimality [5]. Given
a cyclic task system $L$, the goal of this section will therefore be that of generating a periodic
$p$-schedule $\sigma$, admissible for $L$, such that $\|\sigma\|/\|L\|_p$ is, in the worst case, as
close as possible to $(2 - 1/p)$.

The algorithm which follows is simple, runs in $O(T)$ time and guarantees a maximum factor from
asymptotic optimality of $(2 - 1/p) + (p-1)/p \cdot \delta_{\max}/\|L\|_p$. A better performance
bound of $(2 - 1/p) + (p-1)/p \cdot \delta_{\max}/(l \cdot \|L\|_p)$, where $l$ is a user selected
parameter, can be obtained with an additional $O(n m\, l^2)$ cost in time complexity. If operation
durations are integral and the generated schedule is also required to be integral, a slight
modification of the algorithm guarantees a performance bound of
$1 + (p-1)/p \cdot ((\delta_{\max} - 1) + \lceil \|L\|_p \rceil)/\|L\|_p$ times the optimum.

The strategy adopted is to transform the input dependence graph $G$ into an acyclic dependence
graph $G'$ and invoke a list scheduling algorithm on $G'$ to construct the periodic schedule.
The dependence graph $G'$ is obtained by deleting edges from $G$. The difficult part when cutting
edges is to shorten dependence paths as much as possible while preserving admissibility.

Algorithm 2

Input: A cyclic task system $L$ represented by a dependence graph $G = (O, E, \delta, d)$ and a
number of processors $p < \infty$.

Output: An admissible periodic rational $p$-schedule $\sigma$ for $L$ with unfolding 1 such that

$$\frac{\|\sigma\|}{\|L\|_p} \le \left(2 - \frac{1}{p}\right) + \frac{p-1}{p} \cdot \frac{\delta_{\max}}{\|L\|_p}$$

Method:

1. Do steps 1 and 2 of algorithm 1.

2. For any two rational numbers $\lambda, \mu$ define
$(\lambda \bmod \mu) = \lambda - \lfloor \lambda/\mu \rfloor \cdot \mu$. Delete an edge
$e = (op, op')$ in $G$ if and only if

$$\tau(s, op') \bmod \rho_G < \tau(s, op) \bmod \rho_G + \delta(op)$$

The resulting graph $G'$ is acyclic (see lemma 1). This step takes $O(m)$ time.

3. Using any list scheduling algorithm generate a $p$-schedule $\sigma_a$ admissible for the
acyclic task system represented by the dependence graph $G'$. This step takes
$O(m + n \log n)$ time.

4. For all $op \in O$ and $i \ge 0$ set
$\sigma(op[i]) = \sigma_a(op) + |\sigma_a| \cdot (i + \lfloor \tau(s, op)/\rho_G \rfloor)$.

The overall algorithm requires $O(m + n \log n + T)$ time if Karp's or Young, Tarjan & Orlin's
algorithms are used to compute $\rho_G$ and otherwise $O(nm + T)$ time.
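Steps 2 and 4 of algorithm 2 can be sketched as follows. The sketch assumes that the ratio `rho` (positive) and the longest path lengths `tau` from step 1 are already available, and that the acyclic schedule `sigma_a` of step 3 is produced separately by some list scheduler; all names and the data representation are illustrative assumptions.

```python
from fractions import Fraction

def cut_edges(edges, delta, tau, rho):
    """Step 2: return the edges (op, op2) that survive in the acyclic graph.

    An edge (op, op2, d) is deleted exactly when
    tau[op2] mod rho < tau[op] mod rho + delta[op].
    """
    kept = []
    for u, v, _ in edges:
        deleted = (Fraction(tau[v]) % rho) < (Fraction(tau[u]) % rho) + delta[u]
        if not deleted:
            kept.append((u, v))
    return kept

def periodic_from_acyclic(sigma_a, delta, tau, rho):
    """Step 4: build the periodic schedule from the acyclic schedule sigma_a.

    Returns (sigma, length) where sigma(op, i) is the start time of op[i]
    and length, the length of sigma_a, is the initiation interval.
    """
    length = max(sigma_a[op] + delta[op] for op in sigma_a)

    def sigma(op, i):
        # Fraction floor division yields the integer floor of tau[op] / rho.
        return sigma_a[op] + length * (i + Fraction(tau[op]) // rho)

    return sigma, length
```

As a small check, take `a` (duration 3) with a self-loop of weight 1 and an edge `a -> b` of weight 0 to `b` (duration 1). Here the ratio is 3 and the longest path lengths are 0 and 3, so both edges are deleted; with two processors both operations start at 0 in the acyclic schedule, and the periodic schedule delays `b` by one initiation interval so that `b[i]` starts exactly when `a[i]` completes.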

The first step in proving the correctness of algorithm 2 is to show that schedule $\sigma_a$ is
well defined, that is $G'$ is acyclic. As a side result we bound the length of the longest path in
$G'$.

Lemma 1 The graph $G'$ obtained in step 2 of algorithm 2 is acyclic. Furthermore

$$\max_{P \text{ path of } G'} \delta(P) < \rho_G + \delta_{\max}$$

Proof: Let $C$ be a cycle in $G$ comprising edges $(op_1, op_2), \ldots, (op_c, op_1)$. Suppose
that no edge is deleted from $C$. Then

$$\forall\, 1 \le i \le c \qquad \tau(s, op_i) \bmod \rho_G + \delta(op_i) \le \tau(s, op_{1 + i \bmod c}) \bmod \rho_G$$

Thus $(\tau(s, op_1) \bmod \rho_G) + \delta(C) \le (\tau(s, op_1) \bmod \rho_G)$, a contradiction.

For the second claim let $P$ be a path of $G'$ comprising edges
$(op_1, op_2), \ldots, (op_{c-1}, op_c)$ and suppose that
$\rho_G \le \sum_{i=1}^{c-1} \delta(op_i)$. Then because of the previous inequalities one can write

$$\rho_G \le \tau(s, op_1) \bmod \rho_G + \sum_{i=1}^{c-1} \delta(op_i) \le \tau(s, op_c) \bmod \rho_G$$

a contradiction. Thus $\delta(P) < \rho_G + \delta_{\max}$. $\Box$

The previous lemma shows that algorithm 2 does indeed unambiguously generate a periodic
schedule. It remains to prove its admissibility for the input cyclic task system $L$.

Theorem 3 For any cyclic task system $L$ and any number of processors $p$, the output of
algorithm 2 is a periodic $p$-schedule admissible for $L$.

Proof: The schedule $\sigma$ generated by algorithm 2 is clearly periodic. Furthermore no more
than $p$ operations are being processed at any given moment because the acyclic schedule
$\sigma_a$ created in step 3 is a $p$-schedule and $\sigma$'s initiation interval is $|\sigma_a|$.
It remains to show that $\sigma$ is admissible for $L$. This is true if and only if

$$\forall\, e = (op, op') \in E \quad \forall\, i \ge 0 \qquad \sigma(op[i]) + \delta(op) \le \sigma(op'[i + d(e)])$$

Because of the definition of $\sigma$ the previous inequality can be rewritten as

$$\sigma_a(op) + |\sigma_a| \cdot \left\lfloor \frac{\tau(s, op)}{\rho_G} \right\rfloor + \delta(op) \le \sigma_a(op') + |\sigma_a| \cdot \left( d(e) + \left\lfloor \frac{\tau(s, op')}{\rho_G} \right\rfloor \right)$$

As in theorem 1 the proof that the above inequality holds relies on the longest paths inequality

$$\tau(s, op) + \delta(op) - \rho_G \cdot d(e) \le \tau(s, op')$$

There are two cases to consider depending on whether edge $e = (op, op')$ has been deleted by
algorithm 2 in step 2. If $e$ has not been deleted then

$$\sigma_a(op) + \delta(op) \le \sigma_a(op')$$

furthermore by dividing the longest path inequality by $\rho_G$ and taking the floor one derives
the following inequality:

$$\left\lfloor \frac{\tau(s, op)}{\rho_G} \right\rfloor \le d(e) + \left\lfloor \frac{\tau(s, op')}{\rho_G} \right\rfloor$$

and therefore if $e$ has not been deleted the dependence constraint is respected in $\sigma$. If
on the contrary edge $e$ has been deleted in step 2 then it must be that

$$\tau(s, op') \bmod \rho_G < \tau(s, op) \bmod \rho_G + \delta(op)$$

Thus by rewriting the longest path inequality as:

$$\left\lfloor \frac{\tau(s, op)}{\rho_G} \right\rfloor \cdot \rho_G + \tau(s, op) \bmod \rho_G + \delta(op) - \rho_G \cdot d(e) \le \left\lfloor \frac{\tau(s, op')}{\rho_G} \right\rfloor \cdot \rho_G + \tau(s, op') \bmod \rho_G$$

one derives

$$\left\lfloor \frac{\tau(s, op)}{\rho_G} \right\rfloor < d(e) + \left\lfloor \frac{\tau(s, op')}{\rho_G} \right\rfloor$$

and therefore

$$\sigma_a(op) + \delta(op) - \sigma_a(op') \le |\sigma_a| \le |\sigma_a| \cdot \left( d(e) + \left\lfloor \frac{\tau(s, op')}{\rho_G} \right\rfloor - \left\lfloor \frac{\tau(s, op)}{\rho_G} \right\rfloor \right)$$

which shows that the dependence constraint is also respected in $\sigma$ if $e = (op, op')$ has
been deleted. $\Box$


It remains to bound the performance of $\sigma$.

Theorem 4 For any cyclic task system $L$ and any number of processors $p$, the $p$-schedule
$\sigma$ generated by algorithm 2 is such that

$$\frac{\|\sigma\|}{\|L\|_p} \le 2 - \frac{1}{p} + \frac{p-1}{p} \cdot \frac{\delta_{\max}}{\|L\|_p}$$

Proof: The optimum asymptotic performance $\|L\|_p$ is clearly bounded from below by the duration
of operations in $O$, the number of available processors $p$ and $\rho_G$:

$$\frac{1}{p} \cdot \sum_{op \in O} \delta(op) \le \|L\|_p \qquad \rho_G \le \|L\|_p$$

Let $\phi$ denote the overall time in $\sigma_a$ when no more than $p - 1$ processors are busy.
Because $\sigma_a$ is generated by a list scheduling algorithm there must exist a dependence path
$P$ in $G'$ such that $\phi \le \delta(P)$. Furthermore lemma 1 implies

$$\phi < \rho_G + \delta_{\max} \le \delta_{\max} + \|L\|_p$$

Thus

$$p \cdot |\sigma_a| \;\le\; \sum_{op \in O} \delta(op) + (p - 1) \cdot \phi \;\le\; p \cdot \|L\|_p + (p - 1) \cdot (\delta_{\max} + \|L\|_p)$$

$$\Longrightarrow \quad \frac{\|\sigma\|}{\|L\|_p} \le 2 - \frac{1}{p} + \frac{p-1}{p} \cdot \frac{\delta_{\max}}{\|L\|_p}$$

Note that if $\|L\|_p$ is small and $\delta_{\max}$ is big, the previous performance bound may be
poor. To improve it, it suffices to unroll $G$ until a satisfactory bound can be guaranteed. More
precisely if for some $l \ge 1$, algorithm 2 is to operate on $G^l$ (the graph obtained by
unrolling $G$ $l$ times) rather than $G$, the bound on performance becomes

$$\frac{\|\sigma\|}{\|L\|_p} \le 2 - \frac{1}{p} + \frac{p-1}{p} \cdot \frac{\delta_{\max}}{l \cdot \|L\|_p}$$

This improved bound comes at an additional $O(n m\, l^2)$ cost in running time complexity.
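The effect of the unrolling parameter on the guarantee is easy to tabulate. The helper below simply evaluates the bound stated above with exact rational arithmetic; its name and signature are illustrative assumptions. Since the optimum asymptotic performance is not known in general, one would in practice substitute a lower bound for it (such as the larger of the maximum cycle ratio and the total duration divided by $p$), which yields a valid but weaker guarantee.

```python
from fractions import Fraction

def performance_bound(p, delta_max, opt, l=1):
    """Evaluate (2 - 1/p) + (p - 1)/p * delta_max / (l * opt).

    p         -- number of processors
    delta_max -- largest operation duration
    opt       -- optimum asymptotic performance (or a lower bound on it)
    l         -- number of times the dependence graph is unrolled
    """
    return (2 - Fraction(1, p)) + Fraction(p - 1, p) * Fraction(delta_max) / (l * Fraction(opt))
```

For example, with 4 processors, a longest operation of duration 6 and an optimum of 3, the guaranteed factor drops from 13/4 without unrolling to 9/4 when the graph is unrolled three times.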

If operation durations are integral and the generated schedule is also required to be integral,
algorithm 2 continues to work if, as in section 4, one replaces $\rho_G$ with
$\lceil \rho_G \rceil$. Let $\sigma'$ be the integral schedule generated. The bound on performance
becomes

$$\frac{\|\sigma'\|}{\|L\|_p} \le 1 + \frac{p-1}{p} \cdot \frac{(\delta_{\max} - 1) + \lceil \|L\|_p \rceil}{\|L\|_p}$$




References 

[1] R. K. Ahuja and J. B. Orlin, New Scaling Algorithms for Assignment and Minimum
Cycle Mean Problems, Tech. Rep. Sloan Working Paper 2019-88, M.I.T., 1988.

[2] F. Allen, M. Burke, P. Charles, R. Cytron, and J. Ferrante, An overview
of the PTRAN analysis system for multiprocessing, Journal of Parallel and Distributed
Computing, 5 (1988), pp. 617-640.

[3] R. Allen and K. Kennedy, Automatic translation of Fortran programs to vector form,
ACM Transactions on Programming Languages and Systems, 9 (1987), pp. 491-542.

[4] P. Chretienne, The basic cyclic scheduling problem with deadlines, Discrete Applied
Mathematics, 30 (1991), pp. 109-123.

[5] E. G. Coffman, Computer and Job-shop Scheduling Theory, John Wiley and Sons, New
York, New York, 1976.

[6] E. G. Coffman and R. L. Graham, Optimal scheduling for two processor systems,
Acta Informatica, 1 (1972), pp. 200-213.

[7] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT
Press and McGraw-Hill, 1990.

[8] R. Cytron, Doacross: beyond vectorization for multiprocessors, in Proceedings of the
1985 International Conference on Parallel Processing (Penn State University, Pennsylvania),
IEEE and ACM, Silver Spring, Maryland, Aug. 1986, pp. 836-844.

[9] H. N. Gabow, An almost-linear algorithm for two-processor scheduling, Journal of the
ACM, 29 (1982), pp. 766-780.

[10] M. R. Garey and D. S. Johnson, Computers and Intractability - A Guide to the
Theory of NP-Completeness, Freeman, New York, New York, 1979.

[11] M. Gondran and M. Minoux, Graphs and Algorithms, Wiley, 1984.

[12] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan,
Optimization and Approximation in Deterministic Sequencing and Scheduling: A Survey,
vol. 5 of Annals of Discrete Mathematics, North Holland Publishing Company, 1979,
pp. 287-326.

[13] N. S. Grigor'yeva, I. S. Latypov, and I. V. Romanovskii, Cyclic problems of
scheduling theory, Tekhnicheskaya Kibernetika, (1988), pp. 3-11. English translation.

[14] K. Iwano and S. Yeh, An efficient algorithm for optimal loop parallelization, in
International Symposium on Algorithms, Springer-Verlag, Aug. 1990, pp. 201-210. Lecture
Notes in Computer Science 450.

[15] R. M. Karp, A Characterization of the Minimum Cycle Mean in a Digraph, vol. 23 of
Discrete Mathematics, North Holland Publishing Company, 1978, pp. 309-311.

[16] M. Lam, Software pipelining: an effective scheduling technique for VLIW machines, in
Proceedings of the SIGPLAN 1988 Conference on Programming Language Design and
Implementation (Atlanta, Georgia), ACM, June 1988, pp. 318-328.

[17] L. Lamport, The parallel execution of DO loops, Communications of the ACM, 17 (1974),
pp. 83-93.

[18] E. L. Lawler, Optimal cycles in doubly weighted directed linear graphs, in Theory of
Graphs-International Symposium, P. Rosenstiehl, ed., Gordon and Breach, Rome 1966,
pp. 209-213.

[19] J. K. Lenstra and A. H. G. Rinnooy Kan, Complexity of scheduling under precedence
constraints, Operations Research, 26 (1978), pp. 22-35.

[20] A. Munshi and B. Simons, Scheduling loops on processors: algorithms and complexity,
SIAM Journal of Computing, 19 (1990), pp. 728-741.

[21] D. A. Padua and M. J. Wolfe, Advanced compiler optimizations for supercomputers,
Communications of the ACM, 29 (1986), pp. 1184-1201.

[22] R. Reiter, Scheduling parallel computations, Journal of the ACM, 15 (1968), pp. 590-599.

[23] N. E. Young, R. E. Tarjan, and J. B. Orlin, Faster parametric shortest path and
minimum-balance algorithms, Networks, 21 (1991), pp. 205-221.



Gasperoni,  Franco 
Efficient  algorithms  for 
cyclic  scheduling. 


NYU COMPSCI TR-571

